    A Feature-Based Lexicalized Tree Adjoining Grammar for Korean

    This document describes an ongoing project to develop a grammar of Korean, the Korean XTAG grammar, which is written in the TAG formalism and implemented for use with the XTAG system enriched with a Korean morphological analyzer. The Korean XTAG grammar described in this report is based on the TAG formalism (Joshi et al. (1975)), which has been extended to include lexicalization (Schabes et al. (1988)) and unification-based feature structures (Vijay-Shanker and Joshi (1991)). The document first describes the modifications we have made to the XTAG system (The XTAG-Group (1998)) to handle the rich inflectional morphology of Korean. It then describes the syntactic phenomena that the grammar can currently handle, including adverb modification, relative clauses, complex noun phrases, auxiliary verb constructions, gerunds, and adjunct clauses. The work reported here is a first step towards an implemented TAG grammar for Korean, which is continuously updated as new analyses are added and old ones are modified.

    Compound noun segmentation based on lexical data extracted from corpus

    Dual Triggered Correspondence Topic (DTCT) model for MeSH annotation

    Word Segmentation Based on Estimation of Words from Examples

    From a cognitive point of view, words can be recognized on the basis of learned data obtained from linguistic materials; that is, people learn words from the many examples they encounter. We propose a word segmentation algorithm based on estimated knowledge of words acquired both from the local text being processed and from a POS-tagged corpus. To show the feasibility of our model, we apply it to guessing unknown words that cause morphological analysis to fail. 1 Introduction: We continuously learn words by seeing and hearing examples, and we acquire new ones based on learned knowledge and new examples. Recognition and segmentation of words can therefore be viewed as a cognitive process. Consider the following example. Figure 1 (Words can be generalized from many samples) shows the eojeol 학교에 (hag-gyo-e, "to school") and its candidate analyses: (a) 학 (hag) + 교에 (gyo-e), (b) 학교 (hag-gyo) + 에 (e), and (c) 학교에 (hag-gyo-e). It is thus possible to divide the eojeol 학교에 (hag-gyo-e) in three ways. Humans know that the..
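
    The three-way ambiguity of 학교에 above can be illustrated with a small sketch. It is not the authors' algorithm: the scoring rule (an interpolation of relative frequencies from the local text and from a tagged corpus, weighted by a hypothetical parameter `alpha`), the restriction to "whole eojeol or one head + tail split", and all counts are assumptions made for illustration. It only shows how estimated word knowledge from two sources can rank the candidate analyses (a), (b), and (c).

```python
from typing import Dict, List


def candidate_splits(eojeol: str) -> List[List[str]]:
    """Return the whole eojeol plus every split into a head and a tail
    (for a 3-syllable eojeol this yields the three analyses (a)-(c))."""
    splits = [[eojeol]]
    for i in range(1, len(eojeol)):
        splits.append([eojeol[:i], eojeol[i:]])
    return splits


def word_score(word: str, local: Dict[str, int], corpus: Dict[str, int],
               alpha: float = 0.5) -> float:
    """Hypothetical word estimate: interpolated relative frequency in the
    local text and in a POS-tagged corpus (alpha weights the local text)."""
    local_total = sum(local.values()) or 1
    corpus_total = sum(corpus.values()) or 1
    return (alpha * local.get(word, 0) / local_total
            + (1 - alpha) * corpus.get(word, 0) / corpus_total)


def best_split(eojeol: str, local: Dict[str, int], corpus: Dict[str, int],
               alpha: float = 0.5) -> List[str]:
    """Pick the candidate whose pieces have the highest product of word scores."""
    def split_score(pieces: List[str]) -> float:
        score = 1.0
        for piece in pieces:
            score *= word_score(piece, local, corpus, alpha)
        return score
    return max(candidate_splits(eojeol), key=split_score)


# Hypothetical counts, for illustration only.
local_counts = {"학교": 3, "에": 10}
corpus_counts = {"학교": 50, "에": 200, "학": 5}

print(candidate_splits("학교에"))  # [['학교에'], ['학', '교에'], ['학교', '에']]
print(best_split("학교에", local_counts, corpus_counts))  # ['학교', '에']
```

    Analysis (a) scores zero because 교에 never appears in either source, and (c) scores zero because the whole eojeol is unseen, so (b) wins. A real system would need smoothing so that genuinely unknown words are not ruled out outright.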

    New Parsing Method Using Global Association Table

    This paper presents a new parsing method for Korean that uses statistical information extracted from a corpus. In Korean, structural ambiguities arise when deciding the dependency relations between words, and lexical associations play an important role in resolving these ambiguities. Our parser computes the lexical associations from statistical co-occurrence data. In addition, it can be shown that sentences can be parsed deterministically through global management of the associations. To this end, we define the global association table (GAT), in which the association between words is recorded. The system is a hybrid semi-deterministic parser controlled not by condition-action rules but by the association values between phrases. Whenever the parser's expectation fails, it chooses among the alternatives using a chart, which avoids backtracking. 1 Introduction: The association of words plays an important role in finding o..
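
    The idea of a global association table can be sketched as follows. This is a hypothetical illustration, not the paper's parser: pointwise mutual information stands in for whatever association measure the authors use, the constraints (every head follows its dependent, one head per word, no crossing arcs) are assumptions, and the chart-based recovery from failed expectations is not modeled. The sketch records an association value for every candidate dependent-head pair and then attaches words in descending order of association over the whole sentence, rather than left to right.

```python
import math
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int]  # (dependent index, head index)


def pmi(dep: str, head: str, pair_counts: Dict[Tuple[str, str], int],
        word_counts: Dict[str, int], n_pairs: int, n_words: int) -> float:
    """Pointwise mutual information of a (dependent, head) pair; -inf if unseen."""
    c = pair_counts.get((dep, head), 0)
    cd, ch = word_counts.get(dep, 0), word_counts.get(head, 0)
    if c == 0 or cd == 0 or ch == 0:
        return float("-inf")
    return math.log((c / n_pairs) / ((cd / n_words) * (ch / n_words)))


def build_gat(words: List[str], assoc: Callable[[str, str], float]) -> Dict[Arc, float]:
    """Record an association value for every dependent i and head j > i
    (assumes head-final order: a head follows its dependents)."""
    return {(i, j): assoc(words[i], words[j])
            for i in range(len(words)) for j in range(i + 1, len(words))}


def crosses(a: Arc, b: Arc) -> bool:
    """True if two arcs (i, j) and (k, l) cross when drawn above the sentence."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def parse(words: List[str], assoc: Callable[[str, str], float]) -> Dict[int, int]:
    """Greedy global attachment: take arcs in descending association order,
    keeping one head per word and rejecting arcs that cross accepted ones."""
    gat = build_gat(words, assoc)
    heads: Dict[int, int] = {}
    for (i, j), _ in sorted(gat.items(), key=lambda kv: kv[1], reverse=True):
        if i in heads or any(crosses((i, j), (d, h)) for d, h in heads.items()):
            continue
        heads[i] = j
    return heads


# Hypothetical sentence and counts: "I (topic)", "red", "apple (object)", "ate".
words = ["나는", "빨간", "사과를", "먹었다"]
pair_counts = {("빨간", "사과를"): 8, ("사과를", "먹었다"): 10, ("나는", "먹었다"): 6,
               ("나는", "빨간"): 1, ("빨간", "먹었다"): 1}
word_counts = {"나는": 20, "빨간": 10, "사과를": 15, "먹었다": 25}


def assoc(dep: str, head: str) -> float:
    return pmi(dep, head, pair_counts, word_counts, n_pairs=100, n_words=1000)


print(parse(words, assoc))  # {1: 2, 2: 3, 0: 3}
```

    Because adjacent arcs can never cross another arc, every word except the last always finds some head, so this greedy pass never needs to backtrack. The price is that an early high-association mistake is never revisited, which is the situation the paper's chart-based fallback is meant to handle.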